Collective dynamics of capacity-constrained ride-pooling fleets

Ride-pooling (or ride-sharing) services combine trips of multiple customers along similar routes into a single vehicle. The collective dynamics of the fleet of ride-pooling vehicles fundamentally underlies the efficiency of these services. In simplified models, the common features of these dynamics give rise to scaling laws of the efficiency that are valid across a wide range of street networks and demand settings. However, it is unclear how constraints of the vehicle fleet impact such scaling laws. Here, we map the collective dynamics of capacity-constrained ride-pooling fleets to services with unlimited passenger capacity and identify an effective fleet size of available vehicles as the relevant scaling parameter characterizing the dynamics. Exploiting this mapping, we generalize the scaling laws of ride-pooling efficiency to capacity-constrained fleets. We approximate the scaling function with a queueing theoretical analysis of the dynamics in a minimal model system, thereby enabling mean-field predictions of required fleet sizes in more complex settings. These results may help to transfer insights from existing ride-pooling services to new settings or service locations.

are the θ − 1 complex zeros of z θ e x(1−z) − 1 = 0 (S5) within and on the unit circle and W 0 denotes the principal branch of the Lambert W function. From the probability generating function Eq. (S3), the average queue lengthq just before the bus arrives at the node follows as Since on average x customers arrive in one service interval, the average queue length q just after the bus has departed from the node is q =q − x. From these queue lengths and the Poisson arrival process, it follows that the time-average ⟨q⟩ of the queue length is This expression captures the divergence of the queue length as the system overloads when x → θ. With this expression for the average queue length we obtain the average number of scheduled cusomters as ⟨C⟩ = x + 2 ⟨q⟩ and find the expression for the efficiency of the service, in the limit of sufficiently large x when the vehicles are not idle. The above calculations directly transfer to a system with a larger fleet B > 1 under the assumption that the vehicles are equidistantly distributed. At constant x, the interval between two vehicles arriving decreases by a factor B and the request rate increases by a factor B, resulting in the same average number x of requests per service interval. Thus, since the number of queued customers does not change, the average number of scheduled customers per vehicle becomes ⟨C⟩ = x + 2 ⟨q⟩ /B. The efficiency consequently follows [Eq. (6) in the main manuscript] The assumption of equidistant vehicles does not hold exactly in practice. If a single vehicle is delayed, a large number of customers making a request in the interval until the delayed vehicle arrives experience an increase of the waiting time. Fewer customers requesting a ride in the time interval until the next vehicle arrives experience a shorter waiting time. Overall, the average waiting time increases. Consequently, any deviation from an equidistant distribution of vehicles results in lower efficiency than predicted. This calculation underlies the results presented in Fig. 4a in the main manuscript

B. Estimation of p delay
The queueing theory description also provides a way to compute the probability p delay that a request is delayed due to the capacity constraints. We again consider only a single node due to the symmetry of the dynamics and denote the four relevant discrete random variables, measured when a vehicle arrives at the node, as follows • Z denotes the number of newly scheduled customers since last service.
• D denotes the number of customers (out of the Z new ones) which cannot be served by the next bus arriving because the capacity constraint would be violated.
• Q denotes the length of the queue just before the bus arrives at the stop.
• Q ′ denotes the queue length just before the previous bus arrived at the stop.
p delay is then defined as the fraction of delayed requests, where E(·) denotes the expectation value. The number of newly arriving customers Z is a Poisson random variable with expected value E(Z) = λ ⟨l⟩ /(vB) = x, assuming an equidistant distribution of vehicles as above.
In order to compute the expectation value E(D) of the number of delayed customers, we compute the marginal probability mass function P (D = d) as the sum over the joint distribution for all possible values of Z, Q and Q ′ The last probability P (Q ′ = q ′ ) is directly given by the equilibrium queue length distribution p q ′ Eq. (S1) The number of arriving customers Z is independent of the current state of the queue such that If the newly arriving customers Z and the previous queue length Q ′ are known, Q follows deterministically. The previous vehicle picked up up to θ customers from the Q ′ waiting customers and Z new customers arrived (compare Eq. (S2). We thus have customers in the queue and the probability reduces to The number D of delayed customers follows similarly in three cases: customers are going to be delayed to a later vehicles.
ii) If θ < Q ′ < 2θ, there are Q ′ − θ < θ customers remaining in the queue after the previous vehicle leaves that will be picked up by the next vehicle. From the newly arrived Z requests, θ − (Q ′ − θ) will also be served, whereas iii) If Q ′ ≥ 2θ, only requests which where in the queue previously are served by the next vehicle and all the new requests are delayed, D = Z.
The relevant conditional probability for delaying customers follows as With Eq. (S11), splitting the summation over q ′ into the three distinct cases yields Eliminating all terms with d = 0 by adjusting the z bounds and evaluating the sum over d yields For each constellation of (z, q ′ ) there is exactly one q within the summation bounds that satisfies δ q,· = 1 in each term such that Replacing the infinite sum in the last term using the normalization condition ∞ q ′ =0 p q ′ = 1 simplifies the expression such that only the first probabilities p q ′ for q ′ ≤ 2θ − 1 are required to evaluate the expression. To evaluate this expression numerically, we cut off the summation over z at z max = 50 (compared to typical values of θ and x less than ten) because of the sharp decay of the Poisson probability k z . This calculation underlies the analytical results presented in Fig. 4b in the main manuscript.

II. SUPPLEMENTARY NOTE 2: MEAN FIELD QUEUEING THEORY FOR ARBITRARY NETWORKS
The general idea of the queueing theoretical calculations above can be extended to arbitrary networks with a mean field approach. Assuming the queueing dynamics at all nodes and all vehicles are effectively identical, we can map the above calculation to arbitrary networks with effective parameters. The more heterogeneous the setting in terms of network topology or demand distribution, the larger the deviations from these mean field assumptions.
There are three main differences to the minimal model calculations: • A single vehicle receives λ/B = xv/ ⟨l⟩ requests per time interval. With the average distance between nodes ⟨e⟩ (mean edge length), the vehicle receives on average x ⟨e⟩ / ⟨l⟩ new requests between stops. For large fleet sizes and sufficiently low delay-probability p delay , only requests originating at the next stop of the vehicle will be assigned to it. Thus, the expected number E(Z) of newly arrived customers at the next node on its route is where x eff denotes the effective load parameter for the mean field calculations.
• Additionally, in large networks, the inter-arrival times between vehicles at a node are not identical. The interarrival time distribution is often more similar to an exponential distribution, reflecting a large number of (almost) independent shortest paths along which vehicles can arrive at a node. We include this variable inter-arrival time by modifying the probability P (Z = z) with an integral over all possible inter-arrival times ∆t The following calculations are independent of the exact choice of the inter-arrival time distributions as it only enters via P (Z = z).
• Finally, not all passengers are dropped of at every stop such that vehicles do not always have its full capacity θ available for the new requests. We thus have to track the occupancy statistics of the vehicles in addition to the queue length statistics at the node.
Let O denote an additional random variable describing the occupancy of a vehicle at a node immediately after it has dropped of all passengers. The vehicle then has θ − O seats available for requests from that node. Let p with entries p q denote the probability to observe a queue length q (as above) and π with entries π o denote the probability to observe an occupancy o. Similarly to Eq. (S1) above, we assume a steady state where both probability distributions fulfill the fixed point equations p = P p π = Π π . (S23) Note that these equations are coupled since the occupancy depends on the number of queued customers and the number of queued (and delayed) customers depends on the occupancy. To compute the transition probabilities, we neglect correlations between observables that go beyond one service interval. The transition probabilities P for the queue lengths follow similar to the minimal model. We define the relevant random variables as above: • Z denotes the number of newly arrived customers since last service.
• Q denotes the length of the queue just before the vehicle arrives at the stop.
• Q ′ denotes the queue length just before the previous vehicle arrived at the stop.
• O ′ denotes the occupancy of the last vehicle that arrived at the node.
The transitions probability is given as the marginal probability We readily insert and where the latter equation implicitly assumes that there is no correlation between the occupancy of the previous vehicle and the queue length at that time. With given O ′ , Q ′ and z, the queue length Q is deterministic as resulting in The transition probabilities Π of the occupancy are more complex and require the estimation of how many customers leave the vehicle at the stop. Since we cannot track individual customers, we introduce two new random variables, only tracking one step explicitly and treating all customers who drive longer than one stop identically: • L 1 denotes the number of customers that were picked up at the last stop and are dropped off at the current stop.
• L ∞ denotes the number of customers in the vehicle for more than one stop that are dropped off at the current stop.
We again write the transition probability as the marginal probability Similar to the above calculation, we take P ( Since there is no difference between customers, both L 1 and L ∞ follow Binomial distributions B(l 1 ; min [q ′ , θ − o ′ ] , p 1 ) and B(l ∞ ; o ′ , p ∞ ), respectively, where the second argument described the total number of customers that could be dropped off (i.e. that were picked up at the last node or that remained in the vehicle) and the third argument denotes topology-dependent drop-off probabilities measured from simulations. Alternatively, estimates for these probabilities could be obtained by counting neighboring nodes, assuming customers travel along shortest paths in the limit of large fleet size.
Given all other quantities, the occupancy follows deterministicly as Inserting these probabilities into Eq. (S29) and evaluating the δ operators results in the final expression We numerically compute the distributions p and π by iterating Eq. (S23) with the transitions proabbilities Eq. (S28) and (S31) for 100 times starting from a random initial distribution. We re-normalize the distributions in each step such that they describes a mean occupancy x, accurate in the limit of large fleet sizes and high efficiencies. To facilitate the numerical implementation, we cut off the queue length distribution at q max = 50. The distribution of the occupancy is naturally bounded by the capacity θ.
A. Estimation of p delay Following the same approach as above, we compute p delay via from Eq. (S21) and (S35) Substituting all conditional probabilities and evaluating the sum analogously to the calculation in the minimal model, we arrive at the estimate which we evaluate as the minimal model results numerically. This calculation with a deterministic inter-arrival time distribution (Poisson distributed new arrivals Z) underlies the results presented in the inset of Fig. 4b in the main manuscript. A comparison between estimations with equidistant arrivals and exponentially distributed inter-arrival times is shown in Fig. S1. In order to compare a constrained system with its unconstrained equivalent, we calculate the effective fleet size B eff = (1 − p delay ) B. To compute E(G, B eff , x, θ = ∞) at potentially non-integer values B eff , we interpolate between efficiencies as follows.
For each (G, x), we select several fleet sizes B ′ and find the corresponding efficiencies E(G, B ′ , x, θ = ∞) by simulation. In search of a regression function E ∼ ∞ (G, · , x, θ = ∞) that fits this data, a multilayer perceptron with hidden layers of sizes (5, 10, 5) and a tanh activation function has been trained with an L2 regularization using scikit-learn. The values on the vertical axis of Fig. ??b are the outcomes of inserting B or B eff into this function E ∼ ∞ . While this approach is not strictly necessary to interpolate the efficiency function, it has the additional advantage of compensating statistical fluctuations from the measured efficiencies.
To estimate the required fleet sizes, we fix the graph G, the vehicle capacity θ, the vehicle velocity v and the request rate λ. We compute an estimate of the universal scaling function for various loads x via regression of all model topologies using a neural network of four hidden layers of sizes (80, 20, 10, 5) and a tanh activation function. We start with an estimate B req and the resulting estimate for the delay probability p delay to compute the efficiency via the regression of the universal scaling function. Since the load x changes as we vary the fleet size, we interpolate linearly between values of the scaling parameter B 1/2 (T , x) if required. We vary the fleet size B req until the predicted efficiency is equal to the target efficiency E tar .
[1] N. T. J. Bailey, On queueing processes with bulk service, J. R. Stat. Soc. B 16, 80 (1954). Supplementary Figure S1. Estimation of the delay probability p delay . Black dots represent estimates of p delay assuming equidistant arrivals of vehicles, gray dots represent the same estimate assuming an exponential inter-arrival time distribution. See Methods in the main manuscript for details on the settings and simulations.